Robust Region Feature Synthesizer for Zero-Shot Object Detection
Abstract
Zero-shot object detection aims to incorporate class semantic vectors to detect both seen and unseen classes in an unconstrained test image. In this study, we identify the core challenge in this research area: how to synthesize region features for unseen objects that are as intra-class diverse and inter-class separable as real samples, so that strong unseen-object detectors can be trained on them.
Methodology
To address these challenges, we build a novel zero-shot object detection framework that contains two key components:
1. Intra-class Semantic Diverging (ISD) Component: Realizes a one-to-many mapping that generates diverse visual features from each class semantic vector, preventing real unseen objects from being misclassified as image background.
2. Inter-class Structure Preserving (ISP) Component: Prevents the synthesized features from becoming so scattered that they blur the inter-class and foreground-background relationships.
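The one-to-many mapping behind the ISD component can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dimensions, the random MLP weights, and the `synthesize_features` helper are all hypothetical, standing in for a learned generator that concatenates a class semantic vector with fresh noise to produce diverse region features.

```python
import numpy as np

rng = np.random.default_rng(0)

SEM_DIM, NOISE_DIM, FEAT_DIM = 300, 64, 128  # hypothetical dimensions

# Hypothetical generator weights; in the actual framework these are learned.
W1 = rng.standard_normal((SEM_DIM + NOISE_DIM, 256)) * 0.02
W2 = rng.standard_normal((256, FEAT_DIM)) * 0.02

def synthesize_features(semantic_vec, n_samples):
    """One-to-many mapping: one class semantic vector -> many region features.

    Each sample concatenates the semantic vector with a fresh noise vector,
    so the synthesized features are intra-class diverse rather than
    collapsing onto a single point per class.
    """
    sem = np.tile(semantic_vec, (n_samples, 1))          # (n, SEM_DIM)
    noise = rng.standard_normal((n_samples, NOISE_DIM))  # (n, NOISE_DIM)
    h = np.maximum(np.concatenate([sem, noise], axis=1) @ W1, 0.0)  # ReLU MLP
    return h @ W2                                        # (n, FEAT_DIM)

sem_vec = rng.standard_normal(SEM_DIM)  # e.g. a word embedding of an unseen class
feats = synthesize_features(sem_vec, 8)
print(feats.shape)                      # (8, 128)
print(np.allclose(feats[0], feats[1]))  # False: noise makes the samples diverse
```

The noise input is what turns a single semantic vector into a distribution of plausible region features, which is what lets a detector trained on them generalize to real unseen objects.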
This framework ensures that the synthesized region features maintain both intra-class diversity and inter-class separability, which are crucial for training robust unseen object detectors.
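One way to make the inter-class separability requirement concrete is a centroid-based objective in the spirit of the ISP component. The loss form below is an assumed illustration, not the paper's actual objective: it pulls each synthesized feature toward its class centroid while pushing class centroids at least a margin apart, so diversity does not degenerate into classes mixing up.

```python
import numpy as np

def isp_loss(feats, labels, margin=1.0):
    """Sketch of an inter-class structure-preserving objective (assumed form).

    Pull term: mean distance of each feature to its own class centroid
    (keeps classes compact). Push term: hinge penalty when two class
    centroids are closer than `margin` (keeps classes separable).
    """
    classes = np.unique(labels)
    centroids = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    idx = {c: i for i, c in enumerate(classes)}
    pull = np.mean([np.linalg.norm(f - centroids[idx[l]])
                    for f, l in zip(feats, labels)])
    push = 0.0
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            push += max(0.0, margin - np.linalg.norm(centroids[i] - centroids[j]))
    return pull + push

rng = np.random.default_rng(1)
# Two compact, well-separated synthetic classes: push term vanishes,
# so the loss reduces to the small pull term.
feats = np.concatenate([rng.normal(0, 0.1, (16, 4)), rng.normal(5, 0.1, (16, 4))])
labels = np.array([0] * 16 + [1] * 16)
print(isp_loss(feats, labels) < 1.0)  # True
```

Minimizing such a term alongside the diversity-inducing generator balances the two requirements: features spread out within a class, but class clusters (and the foreground-background boundary) stay distinct.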
Experimental Results
To demonstrate the effectiveness of the proposed approach, we conduct comprehensive experiments on the PASCAL VOC, COCO, and DIOR datasets.
Key Achievements:
• Our approach achieves new state-of-the-art performance on the PASCAL VOC and COCO datasets.
• This is the first study to perform zero-shot object detection on remote sensing imagery (the DIOR dataset).
The results validate that the proposed robust region feature synthesizer effectively addresses the core challenges in zero-shot object detection, enabling accurate detection of both seen and unseen object classes.